From Magic to Mixed Feelings: Analyzing ‘One Hundred Years of Solitude’ Reviews

How readers experience the novel: A deep dive into emotional responses, writing complexity, and thematic connections across different rating categories.

SWDchallenge

Data Visualization

R Programming

2025

A comprehensive analysis of reader reviews for ‘One Hundred Years of Solitude’, examining emotional patterns, writing complexity, and common themes through data visualization of Goodreads and LibraryThing reviews.

Author: Steven Ponce

Published: January 5, 2025

Modified: May 22, 2025

Figure 1: Data visualization analyzing reviews of One Hundred Years of Solitude with four plots: (1) Distribution of emotional content by rating category, showing positive emotions dominating higher ratings; (2) Emotional flow through reviews, illustrating a mix of joy, trust, and sadness across the text; (3) Review complexity by rating, indicating longer sentences in positive reviews; (4) Common word pairs in reviews, highlighting frequent terms such as ‘family’, ‘buendía’, and ‘realism’.

Update: This post has been updated based on feedback from the #SWDchallenge community. Changes include:
- Fixed the chart legends that were inadvertently left out during one iteration.

Steps to Create this Graphic

1. Load Packages & Setup

if (!require("pacman")) install.packages("pacman")
pacman::p_load(
  tidyverse,         # Easily Install and Load the 'Tidyverse'
  ggtext,            # Improved Text Rendering Support for 'ggplot2'
  showtext,          # Using Fonts More Easily in R Graphs
  scales,            # Scale Functions for Visualization
  glue,              # Interpreted String Literals
  here,              # A Simpler Way to Find Your Files
  janitor,           # Simple Tools for Examining and Cleaning Dirty Data
  skimr,             # Compact and Flexible Summaries of Data
  camcorder,         # Record Your Plot History
  textcat,           # N-Gram Based Text Categorization
  ggdist,            # Visualizations of Distributions and Uncertainty
  tidytext,          # Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools
  patchwork          # The Composer of Plots
) 

# Source utility functions
suppressMessages(source(here::here("R/utils/fonts.R")))
source(here::here("R/utils/social_icons.R"))
source(here::here("R/utils/image_utils.R"))
source(here::here("R/themes/base_theme.R"))

2. Read in the Data

goodreads <- read_csv(
  here::here("data/goodreads_reviews_full.csv"))
  
librarything <- read_csv(
  here::here("data/librarything_reviews_full.csv"))

# Combine the datasets
combined_reviews <- bind_rows(goodreads, librarything)
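
bind_rows() matches columns by name and fills any column that one source lacks with NA, which is why the Tidy Data step merges the two rating columns with coalesce(). The base-R sketch below reproduces both operations on toy data so the NA padding is visible. It assumes, hypothetically, that Goodreads supplies only star_rating and LibraryThing only numeric_rating; the reviewers, ratings, and the helper bind_rows_base() are invented for illustration.

```r
# Toy stand-ins for the two sources (hypothetical values, not the scraped data)
goodreads_toy <- data.frame(
  reviewer    = c("a", "b"),
  star_rating = c(5, 2),
  source      = "Goodreads"
)
librarything_toy <- data.frame(
  reviewer       = c("c", "d"),
  numeric_rating = c(4, 3),
  source         = "LibraryThing"
)

# Hypothetical helper: row-bind by column name, padding any column a source
# lacks with NA (dplyr::bind_rows() does this automatically)
bind_rows_base <- function(...) {
  dfs <- list(...)
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(df) {
    for (col in setdiff(all_cols, names(df))) df[[col]] <- NA
    df[all_cols]
  })
  do.call(rbind, padded)
}

combined_toy <- bind_rows_base(goodreads_toy, librarything_toy)

# First non-missing rating per row, like coalesce(star_rating, numeric_rating)
combined_toy$rating <- ifelse(is.na(combined_toy$star_rating),
                              combined_toy$numeric_rating,
                              combined_toy$star_rating)
combined_toy
```

In the real pipeline, dplyr::bind_rows() and dplyr::coalesce() do this work in one call each; the base-R version only makes the intermediate NA columns explicit.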

3. Examine the Data

glimpse(goodreads)
glimpse(librarything)
glimpse(combined_reviews)

4. Tidy Data

combined_reviews_clean <- combined_reviews |>
  # Combine 'star_rating' and 'numeric_rating' into a single 'rating' column
  mutate(rating = coalesce(star_rating, numeric_rating)) |>
  # Convert 'review_date' to Date format
  mutate(review_date = lubridate::mdy(review_date)) |>
  # Standardize column names
  rename(
    reviewer = reviewer_name,
    date = review_date,
    text = review_text
  ) |>
  # Clean up review text  
  mutate(
    text = str_squish(text), # Remove extra whitespace
    text = tolower(text),    # Convert to lowercase
    # Keep only ASCII letters, digits, spaces, and . , ! ? '
    # (this also removes non-ASCII characters such as accented letters)
    text = str_replace_all(text, "[^a-zA-Z0-9 .,!?']", "")
  ) |>
  # Select and reorder columns
  select(reviewer, date, rating, text, source) |>
  # Remove duplicate rows
  distinct() |> 
  mutate(
    language = textcat(text),             # Add detected language as a new column
    word_count = str_count(text, "\\S+")  # Count words in text
  ) |>
  filter(language == "english")           # Keep only English reviews


# Housekeeping
rm(goodreads, librarything, combined_reviews)


# Prepare text data for sentiment analysis
review_sentiments <- combined_reviews_clean |>
  unnest_tokens(word, text) |>
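  anti_join(stop_words, by = "word") |>
  inner_join(get_sentiments("nrc"), by = "word",
             relationship = "many-to-many") |>  # NRC tags many words with several emotions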
  # Add rating categories for comparison
  mutate(rating_category = case_when(
    rating <= 2 ~ "Negative (1-2)",
    rating == 3 ~ "Neutral (3)",
    rating >= 4 ~ "Positive (4-5)"
  ))

# 1. Complexity analysis: average words per sentence in each review
complexity_analysis <- combined_reviews_clean |>    
  mutate(
    sentences = str_count(text, "[.!?]+"),
    words_per_sentence = word_count / sentences,
    rating_category = factor(case_when(
      rating <= 2 ~ "Negative (1-2)",
      rating == 3 ~ "Neutral (3)",
      rating >= 4 ~ "Positive (4-5)"
    ), levels = c("Negative (1-2)", "Neutral (3)", "Positive (4-5)"))
  ) |>
  filter(is.finite(words_per_sentence))

# 2. Sentiment flow: share of positive, negative, and neutral emotion words per rating category
sentiment_flow <- review_sentiments |>
  mutate(
    theme = case_when(
      sentiment %in% c("joy", "trust", "anticipation") ~ "positive",
      sentiment %in% c("anger", "fear", "disgust") ~ "negative",
      TRUE ~ "neutral"   # includes sadness, surprise, and NRC's own positive/negative tags
    )
  ) |>
  count(rating_category, theme) |>
  group_by(rating_category) |>
  mutate(prop = n/sum(n)) |>
  ungroup()

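# 3. Temporal pattern: where within each reviewer's text the emotion words appear
temporal_pattern <- review_sentiments |>
  group_by(reviewer) |>
  mutate(
    position = row_number(),        # order of emotion-tagged rows (one per word-emotion pair)
    position_pct = position / n()   # relative position within the reviewer's rows, in (0, 1]
  ) |>
  ungroup() |>                      # pool reviewers; count() on grouped data would keep `reviewer`
  count(position_pct = round(position_pct, 2), sentiment)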

# 4. Bigram counts: most frequent pairs of adjacent non-stop-words
bigram_graph <- combined_reviews_clean |>
  unnest_tokens(bigram, text, token = "ngrams", n = 2) |>
  separate(bigram, c("word1", "word2"), sep = " ") |>
  filter(
    !word1 %in% stop_words$word,
    !word2 %in% stop_words$word,
    !is.na(word1),
    !is.na(word2)
  ) |>
  count(word1, word2, sort = TRUE) |>
  filter(n >= 4) |>  # Keep pairs that occur at least four times
  slice_head(n = 15)  # Take only top 15 pairs
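
The cleaning rules and the words-per-sentence metric are easiest to audit on a single string. The base-R sketch below mirrors str_squish(), tolower(), the character filter, str_count(text, "\\S+") for words, and str_count(text, "[.!?]+") for sentences. The helper names and the sample review are hypothetical. It also exposes a side effect of the filter: the accented í in "Buendía" is removed along with the special characters.

```r
# Hypothetical helper mirroring the tidy-step text cleaning
clean_review <- function(text) {
  text <- trimws(gsub("\\s+", " ", text))   # collapse whitespace, like str_squish()
  text <- tolower(text)
  # Keep only ASCII letters, digits, spaces and . , ! ? '
  # (accented letters such as "í" are removed too)
  gsub("[^a-zA-Z0-9 .,!?']", "", text, perl = TRUE)
}

# Words = runs of non-whitespace, like str_count(text, "\\S+")
count_words <- function(text) {
  lengths(regmatches(text, gregexpr("\\S+", text)))
}

# Sentences = runs of terminal punctuation, like str_count(text, "[.!?]+")
count_sentences <- function(text) {
  lengths(regmatches(text, gregexpr("[.!?]+", text)))
}

review <- "  The Buend\u00eda   family saga is dense!  Worth it... mostly.  "
cleaned <- clean_review(review)
cleaned

words <- count_words(cleaned)
sentences <- count_sentences(cleaned)
words / sentences   # words per sentence
```

A review containing no ., !, or ? has zero sentences, so its ratio is Inf; the pipeline drops those rows with filter(is.finite(words_per_sentence)).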
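
The lexicon join and the theme proportions behind the first panel can be checked by hand on a few tokens. This base-R sketch uses a made-up mini-lexicon, not the real NRC data, but keeps NRC's key property that one word can carry several emotions. merge() stands in for inner_join(), nested ifelse() for case_when(), and prop.table() for the count() plus mutate(prop = n / sum(n)) step; all data values are hypothetical.

```r
# Toy tokens after stop-word removal: one row per word, tagged with the review's rating
tokens <- data.frame(
  rating = c(5, 5, 5, 3, 1, 1),
  word   = c("magic", "family", "love", "long", "boring", "confusing")
)

# Hypothetical mini-lexicon (NOT the real NRC data); as in NRC,
# a single word can carry more than one emotion label
lexicon <- data.frame(
  word      = c("magic", "magic", "love", "love", "long", "boring", "confusing"),
  sentiment = c("joy", "surprise", "joy", "trust", "anticipation", "disgust", "fear")
)

# Inner join: unmatched words ("family") drop out, multi-label words are duplicated
scored <- merge(tokens, lexicon, by = "word")

# Same bucketing rules as the case_when(); a rating matching no rule gets NA
scored$rating_category <- ifelse(scored$rating <= 2, "Negative (1-2)",
                          ifelse(scored$rating == 3, "Neutral (3)",
                          ifelse(scored$rating >= 4, "Positive (4-5)", NA)))

# Collapse emotions into three themes; anything unlisted becomes "neutral"
scored$theme <- ifelse(scored$sentiment %in% c("joy", "trust", "anticipation"), "positive",
                ifelse(scored$sentiment %in% c("anger", "fear", "disgust"), "negative",
                       "neutral"))

# Share of each theme within each rating category (each row sums to 1)
props <- prop.table(table(scored$rating_category, scored$theme), margin = 1)
props
```

Two consequences carry over to the real analysis: a word with several labels is counted once per label (one reason dplyr 1.1 warns about a many-to-many relationship on the real join unless relationship = "many-to-many" is set), and because the case_when() has no fallback branch, a half-star rating such as 3.5, if a source allowed one, would receive no rating category.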

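The bigram step keeps a pair of adjacent words only when neither word is a stop word, then counts identical pairs and applies a frequency threshold. Below is a base-R sketch of that filter-and-count logic, using a hypothetical helper count_bigrams(), toy texts, and a tiny stand-in stop-word list. Unlike unnest_tokens(), it splits on whitespace only, so it assumes punctuation has already been stripped.

```r
# Hypothetical helper: count adjacent word pairs, dropping any pair
# in which either word is a stop word
count_bigrams <- function(texts, stopwords, min_n = 1) {
  pairs <- unlist(lapply(texts, function(txt) {
    words <- strsplit(txt, "\\s+")[[1]]
    words <- words[nzchar(words)]             # drop empty strings from leading spaces
    if (length(words) < 2) return(character(0))
    w1 <- head(words, -1)                     # first word of each pair
    w2 <- tail(words, -1)                     # second word of each pair
    keep <- !(w1 %in% stopwords) & !(w2 %in% stopwords)
    paste(w1[keep], w2[keep])
  }))
  counts <- sort(table(pairs), decreasing = TRUE)
  counts[counts >= min_n]                     # frequency threshold, like filter(n >= 4)
}

texts <- c(
  "magical realism at its best",
  "the buendia family and magical realism",
  "magical realism and the buendia family",
  "a dense long novel"
)
toy_stopwords <- c("a", "and", "at", "its", "the")

bigrams <- count_bigrams(texts, toy_stopwords, min_n = 2)
bigrams
```

Stop words break pairs rather than being skipped over, so a phrase such as "realism at its best" contributes no pair at all.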
5. Visualization Parameters

### |-  plot aesthetics ----
# Get base colors with custom palette
colors <- get_theme_colors(palette = c("negative" = "#E69B95", "neutral"  = "#709BB0", "positive" = "#86B8B1"))

### |-  titles and caption ----
title_text   <- str_glue("From Magic to Mixed Feelings: Analyzing 'One Hundred Years of Solitude' Reviews") 

subtitle_text <- str_glue(
  "How readers experience the novel: A deep dive into emotional responses, writing complexity, and thematic\n
connections across different rating categories",
  
  "\n\n**Note**: This analysis is based on a small sample of 42 reviews, collected from Goodreads and LibraryThing\n
as of January 3, 2025.")

# Create caption
caption_text <- create_swd_caption(
  year = 2025,
  month = "Jan",
  source_text = "Source: Scraped from Goodreads & LibraryThing"
)


# |- fonts ----
setup_fonts()
fonts <- get_font_families()

### |-  plot theme ----
# Start with base theme
base_theme <- create_base_theme(colors)
            

# Add weekly-specific theme elements
weekly_theme <- extend_weekly_theme(
    base_theme,
    theme(
      plot.margin         = margin(t = 10, r = 20, b = 10, l = 20),
      axis.title.x        = element_text(margin = margin(10, 0, 0, 0), size = rel(1.1), 
                                         color = colors$text, family = fonts$text, face = "bold", hjust = 0.5),
      axis.title.y        = element_text(margin = margin(0, 10, 0, 0), size = rel(1.1), 
                                         color = colors$text, family = fonts$text, face = "bold", hjust = 0.5),
      axis.text           = element_text(size = rel(0.8), color = colors$text),
      axis.line.x         = element_line(color = "#252525", linewidth = .3),
      axis.ticks.x        = element_line(color = colors$text),  
      axis.title          = element_text(face = "bold"),
      panel.grid.minor    = element_blank(),
      panel.grid.major    = element_blank(),
      panel.grid.major.y  = element_line(color = "grey85", linewidth = .4)
      )
)
      

# Set theme
theme_set(weekly_theme)

6. Plot

# 1. Sentiment Flow Plot
p1 <- sentiment_flow |>  
  ggplot(aes(x = rating_category, y = prop, fill = theme)) +
  geom_col(position = "fill", alpha = 0.9) +
  scale_fill_manual(values = colors$palette) +
  scale_y_continuous(labels = scales::percent) +
  labs(
    title = "<b>Distribution of Emotional Content by Rating</b>",
    fill = "Emotional Theme",
    x = "Rating Category",
    y = "Proportion of Emotions"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_markdown(size = rel(1)),
    legend.position = "right",
    plot.margin = margin(t = 10, r = 10, b = 20, l = 10)
  )

# 2. Temporal Pattern Plot
p2 <- temporal_pattern |> 
  ggplot(aes(x = position_pct, y = n, fill = sentiment)) +
  geom_area(position = "fill", alpha = 0.7) +
  scale_fill_brewer(palette = "RdYlBu") +
  scale_x_continuous(
    labels = scales::percent,
    breaks = c(0, 0.25, 0.5, 0.75, 1),
    expand = c(0, 0)
  ) +
  scale_y_continuous(
    labels = scales::percent,
    expand = c(0, 0)
  ) +
  labs(
    title = "<b>Emotional Flow Through Reviews</b>",
    x = "Relative Position in Review",
    y = "Proportion of Emotions",
    fill = "Emotion"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_markdown(size = rel(1)),
    legend.position = "right",
    panel.grid.minor = element_blank(),
    plot.margin = margin(t = 10, r = 10, b = 20, l = 10)
  )

# 3. Complexity Analysis Plot
p3 <- complexity_analysis |> 
  ggplot(aes(x = words_per_sentence, y = rating_category, fill = rating_category)) +
  stat_gradientinterval(
    aes(color = after_scale(fill)), 
    point_size = 1.2,
    alpha = 0.3,
    point_alpha = 0.7
  ) +
  scale_fill_manual(
    values = c(
      "Negative (1-2)" = "#E69B95",
      "Neutral (3)"    = "#709BB0",
      "Positive (4-5)" = "#86B8B1"
    )
  ) +
  labs(
    title = "<b>Review Complexity by Rating</b>",
    x = "Words per Sentence",
    y = NULL,
    fill = "Rating Category"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_markdown(size = rel(1)),
    legend.position = "right",
    panel.grid.major.y = element_blank(),
    plot.margin = margin(t = 10, r = 10, b = 20, l = 10)
  )

# 4. Bigram Network Plot
p4 <- bigram_graph |> 
  ggplot(aes(x = word1, y = word2)) +
  geom_point(aes(size = n), color = colors$palette["neutral"], alpha = 0.7) +
  scale_size_continuous(range = c(2, 6)) +
  labs(
    title = "<b>Common Word Pairs in Reviews</b>",
    size = "Frequency"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_markdown(size = rel(1)),
    legend.position = "right",
    axis.text.x = element_text(hjust = 1),
    panel.grid = element_line(color = "grey90"),
    plot.margin = margin(t = 10, r = 10, b = 20, l = 10)
  )

# Combine plots 
combined_plots <- (p1 + p2) /
  (p3 + p4) +
  plot_annotation(
    title = title_text,
    subtitle = subtitle_text,
    caption = caption_text,
    theme = theme(
      plot.title = element_markdown(
        family = "title",
        face = "bold", 
        size = rel(1.7),
        color = colors$title,
        margin = margin(b = 10)
      ),
      plot.subtitle = element_markdown(
        family = "subtitle",
        size = rel(1.1),
        color = colors$subtitle, 
        margin = margin(b = 20),
        lineheight = 1.1
      ),
      plot.caption = element_markdown(
        family = "caption",
        size = 10, 
        color = colors$caption,
        margin = margin(t = 20),
        hjust = 0.5,
        lineheight = 1.2
      )
    )
  ) &
  theme(plot.background = element_rect(fill =colors$background, color = NA))
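
In the Emotional Flow panel, geom_area(position = "fill") rescales the stacked counts so that the emotions at each relative position sum to 100%. The base-R sketch below computes the quantity the panel is meant to show, on hypothetical data: each emotion-tagged token gets its relative position within its reviewer's sequence, positions are rounded to two decimals, tokens from all reviewers are pooled per bin, and prop.table() normalizes each bin the way position = "fill" does.

```r
# Toy emotion-tagged tokens in reading order (hypothetical data)
tagged <- data.frame(
  reviewer  = c("a", "a", "a", "a", "b", "b"),
  sentiment = c("joy", "trust", "sadness", "joy", "trust", "sadness")
)

# Relative position of each token within its reviewer's sequence,
# like row_number() / n() inside group_by(reviewer)
tagged$position_pct <- ave(seq_len(nrow(tagged)), tagged$reviewer,
                           FUN = function(i) seq_along(i) / length(i))

# Pool all reviewers: count emotions per rounded position bin
counts <- table(round(tagged$position_pct, 2), tagged$sentiment)

# Normalize each bin so its emotions sum to 1, as position = "fill" does
shares <- prop.table(counts, margin = 1)
shares
```

Because every bin is normalized independently, a bin holding a single token shows that emotion at 100%, so with a small sample the area chart can swing sharply between neighboring positions.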

7. Save

### |-  plot image ----  

source(here::here("R/utils/image_utils.R"))
save_plot_patchwork(combined_plots, type = 'swd', year = 2025, month = 01, 
                    width = 12, height = 12)

8. Session Info

R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: America/La_Paz
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] patchwork_1.3.0 tidytext_0.4.2  ggdist_3.3.2    textcat_1.0-9  
 [5] camcorder_0.1.0 skimr_2.1.5     janitor_2.2.0   here_1.0.1     
 [9] glue_1.8.0      scales_1.3.0    showtext_0.9-7  showtextdb_3.0 
[13] sysfonts_0.8.9  ggtext_0.1.2    lubridate_1.9.3 forcats_1.0.0  
[17] stringr_1.5.1   dplyr_1.1.4     purrr_1.0.2     readr_2.1.5    
[21] tidyr_1.3.1     tibble_3.2.1    ggplot2_3.5.1   tidyverse_2.0.0
[25] pacman_0.5.1   

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1     farver_2.1.2         fastmap_1.2.0       
 [4] janeaustenr_1.0.0    digest_0.6.37        timechange_0.3.0    
 [7] lifecycle_1.0.4      rsvg_2.6.1           tokenizers_0.3.0    
[10] magrittr_2.0.3       compiler_4.4.0       rlang_1.1.4         
[13] tools_4.4.0          utf8_1.2.4           yaml_2.3.10         
[16] knitr_1.49           labeling_0.4.3       htmlwidgets_1.6.4   
[19] curl_6.0.0           bit_4.5.0            RColorBrewer_1.1-3  
[22] xml2_1.3.6           repr_1.1.7           withr_3.0.2         
[25] grid_4.4.0           fansi_1.0.6          colorspace_2.1-1    
[28] cli_3.6.3            rmarkdown_2.29       crayon_1.5.3        
[31] generics_0.1.3       rstudioapi_0.17.1    textdata_0.4.5      
[34] tzdb_0.4.0           commonmark_1.9.2     parallel_4.4.0      
[37] ggplotify_0.1.2      yulab.utils_0.1.8    base64enc_0.1-3     
[40] vctrs_0.6.5          Matrix_1.7-0         jsonlite_1.8.9      
[43] slam_0.1-55          gridGraphics_0.5-1   hms_1.1.3           
[46] bit64_4.5.2          systemfonts_1.1.0    magick_2.8.5        
[49] gifski_1.32.0-1      codetools_0.2-20     distributional_0.5.0
[52] stringi_1.8.4        gtable_0.3.6         munsell_0.5.1       
[55] pillar_1.9.0         rappdirs_0.3.3       htmltools_0.5.8.1   
[58] R6_2.5.1             rprojroot_2.0.4      vroom_1.6.5         
[61] evaluate_1.0.1       lattice_0.22-6       markdown_1.13       
[64] SnowballC_0.7.1      gridtext_0.1.5       tau_0.0-26          
[67] snakecase_0.11.1     renv_1.0.3           Rcpp_1.0.13-1       
[70] svglite_2.1.3        xfun_0.49            fs_1.6.5            
[73] pkgconfig_2.0.3

9. GitHub Repository

The complete code for this analysis is available in swd_2025_01.qmd, which is part of the full GitHub repository.

10. References

The web scraping scripts used to collect the review data:
- Goodreads: goodreads_web_scraping.R
- LibraryThing: librarything_web_scraping.R

Data sources:
- Goodreads: One Hundred Years of Solitude Reviews
- LibraryThing: One Hundred Years of Solitude Reviews